Abstract: The paper proposes some robust estimators of the finite population
mean. Such estimators are particularly suitable in the presence of some
outlying observations. Included as special cases of our general result are
robust versions of the ratio estimator and the Horvitz-Thompson estimator. The
robust estimators are derived on the basis of certain predictive influence
functions.



1. Introduction 

It is indeed a pleasure and privilege to contribute to this Festschrift honoring
Professor P. K. Sen, a man whom I have long cherished as my friend, guide and
philosopher. Professor Sen, for nearly five decades, has made many profound
contributions to the discipline of statistics. His research has encompassed every
single area of statistical inference (parametric, semiparametric and
nonparametric), and the theory that he has developed has found applications in
many diverse areas of science. Indeed, he is one of the rare individuals in our
profession who cannot just be identified with one localized area of statistics.
The versatility of his research transcends any single narrowly focused topic, and
the whole is by far bigger than the sum of the parts.

One of the many areas of interest of Professor Sen is the robustness of statistical 
procedures. His multiple authored or coauthored articles on the topic are very well 
summarized and unified in his 1996 classic treatise with Jureckova. The book pro- 
vides a very comprehensive account of the subject with a fully developed asymptotic 
theory. 

In this note, I will consider the robustness issue from a Bayesian perspective
in the context of finite population sampling. Although written within a Bayesian
framework, the proposed estimators can also be viewed entirely from a
model-based perspective. We use the notion of "predictive influence functions"
introduced by Johnson and Geisser [15, 16, 17] to obtain robust alternatives to
a general class of Bayesian model-based estimators of the finite population mean,
which includes in particular robust alternatives to the popular ratio estimators
as well as Horvitz-Thompson estimators.

Section 2 of this paper introduces the concept of "predictive influence functions"
based on a general divergence measure as introduced, for example, in Amari [1]
and Cressie and Read [5]. The general divergence measure includes as special
cases both the Kullback-Leibler and Hellinger divergence measures. The concept of
"predictive influence functions" was first used by Johnson and Geisser [15, 16, 17] in
a Bayesian context based specifically on the Kullback-Leibler divergence measure.
Based on these influence functions, we develop in that section a general
class of robust Bayesian estimators of the finite population mean. As special cases,
we find robust Bayesian alternatives to the ratio estimator as well as the
Horvitz-Thompson estimator. In Section 3, we obtain the mean squared
error of these robust Bayes estimators purely from a frequentist criterion. Some
final remarks are made in Section 4.

Influence functions have a long and rich history in the statistics literature. Their 
importance in robust estimation is well-emphasized in Hampel [12], Huber [14], 
Hampel et al. [13] and many related papers. The predominant idea is to detect
influential observations in terms of their effects on parameters, most typically the 
regression parameters. We have taken instead the predictive point of view. In finite 
population sampling, where the primary goal is to predict the "unseen" from the
"seen", such a predictive approach seems quite natural. However, we have been
able to point out a close connection between the proposed influence function, and 
the ones considered by Hampel, Huber and others. The main point is to control 
the effect of outlying observations for inference in finite population sampling. We 
have pointed out also the connection of the proposed robust ratio estimators to the 
corresponding estimators of Chambers [4] and Gwet and Rivest [10]. 

2. Development of estimators 

Consider a finite population with units labeled 1, 2, ..., N. A subset s of
{1, 2, ..., N} is referred to as a sample. For simplicity, we consider only
samples of fixed size n. Let $y_1, \ldots, y_N$ denote the characteristics of
interest associated with the N units in the population. Consider the hierarchical
Bayesian model where, conditional on $\theta$, the $y_i$ are independently
distributed as $N(\theta a_i, \sigma_i^2)$, $i = 1, 2, \ldots, N$, where the $a_i$
and $\sigma_i^2$ are known constants, while the unknown $\theta$ has a uniform
distribution on the real line. Without loss of generality, let
$s = \{1, 2, \ldots, n\}$. Also, let
$\bar{y}_w = \sum_{i=1}^n a_i \sigma_i^{-2} y_i / \sum_{i=1}^n a_i^2 \sigma_i^{-2}$,
$a_s = (a_1, \ldots, a_n)^T$, $a_u = (a_{n+1}, a_{n+2}, \ldots, a_N)^T$,
$y_s = (y_1, \ldots, y_n)^T$, $y_u = (y_{n+1}, y_{n+2}, \ldots, y_N)^T$ and
$\Sigma_u = \mathrm{diag}(\sigma_{n+1}^2, \ldots, \sigma_N^2)$. It is shown in
Ghosh and Sinha [9] that the posterior distribution of $\theta$ given $y_s$ is
$N(\bar{y}_w, 1/\sum_{i=1}^n a_i^2 \sigma_i^{-2})$, and the posterior predictive
distribution of $y_u$ given $y_s$ is
$N(\bar{y}_w a_u, \Sigma_u + a_u a_u^T / \sum_{i=1}^n a_i^2 \sigma_i^{-2})$. It is
also shown that with the given model, the estimator of the finite population mean
$\bar{y}_P = N^{-1} \sum_{i=1}^N y_i$ is given by

(1)  $\hat{\bar{y}}_P = N^{-1} \Big[ \sum_{i=1}^n y_i + \bar{y}_w \sum_{j=n+1}^N a_j \Big].$

In particular, if $a_i = x_i$, $\sigma_i^2 = \sigma^2 x_i$, $i = 1, \ldots, n$, then
the resulting estimator of the finite population mean is the ratio estimator
$(\bar{y}_s / \bar{x}_s) \bar{x}_P$, where $\bar{y}_s = n^{-1} \sum_{i=1}^n y_i$,
$\bar{x}_s = n^{-1} \sum_{i=1}^n x_i$, and $\bar{x}_P = N^{-1} \sum_{i=1}^N x_i$.
The choice $a_i = x_i$ and $\sigma_i^2 = x_i^2$ leads to the estimator
$N^{-1} [\sum_{i=1}^n y_i + n^{-1} \sum_{i=1}^n (y_i / x_i) \sum_{j=n+1}^N x_j]$, an
estimator introduced in Royall [19] and considered at length in Basu [2].
Finally, the choice $a_i = \pi_i$ and $\sigma_i^2 = \pi_i^2 / (1 - \pi_i)$, where
$\pi_i > 0$ for all $i = 1, \ldots, N$ and $\sum_{i=1}^N \pi_i = n$, leads to the
Horvitz-Thompson estimator $N^{-1} \sum_{i=1}^n (y_i / \pi_i)$ for $\bar{y}_P$. It
is instructive to view the estimator $\bar{y}_w$ of $\theta$ as a weighted
average of the estimators $y_i / a_i$ of $\theta$,
the weights being proportional to $a_i^2 / \sigma_i^2$, the inverses of the
variances of these estimators. However, the estimators $y_i / a_i$ are not
necessarily outlier resistant. In particular, some of the components $y_i / a_i$
could be substantially different from the grand average $\bar{y}_w$. This may
happen when, for instance, a particular ratio $y_i / a_i$ is substantially
smaller in magnitude than the other ratios $y_j / a_j$, but its variance
$\sigma_i^2 / a_i^2$ is much larger relative to the other variances
$\sigma_j^2 / a_j^2$ $(j \neq i)$. Since the estimators $y_i / a_i$ of $\theta$
are weighted inversely proportional to their variances in $\bar{y}_w$, the effect
of this particular $y_i / a_i$ can be very insignificant compared to the other
$y_j / a_j$ $(j \neq i)$ in finding $\bar{y}_w$. In order to control such outlying
observations, as in Hampel [12], we first study the influence of $y_i / a_i$. To
this end, we bring in the notion of "predictive influence functions" as
introduced by Johnson and Geisser [15, 16, 17] based on the Kullback-Leibler
(K-L) divergence measure. The influence function as considered here, however, is
based on a general divergence measure introduced by Amari [1] and Cressie and
Read [5]. This measure includes, but is not limited to, the K-L or
Bhattacharyya-Hellinger (B-H) (Bhattacharyya [3] and Hellinger [11]) divergence
measure. For two densities $f_1$ and $f_2$, this general divergence measure
is given by



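(2)  $D_\lambda(f_1, f_2) = \frac{1}{\lambda(\lambda + 1)} E_{f_1}\Big[ \Big( \frac{f_1}{f_2} \Big)^{\lambda} - 1 \Big].$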



The above divergence measure should be interpreted as its limiting value when
$\lambda \to 0$ or $\lambda \to -1$. We may note that $D_\lambda(f_1, f_2)$ is not
necessarily symmetric in $f_1$ and $f_2$, but the symmetry can always be achieved
by considering $\frac{1}{2}[D_\lambda(f_1, f_2) + D_\lambda(f_2, f_1)]$. Also, it
may be noted that as $\lambda \to 0$,
$D_\lambda(f_1, f_2) \to E_{f_1}[\log(f_1 / f_2)]$, while if $\lambda \to -1$,
$D_\lambda(f_1, f_2) \to E_{f_2}[\log(f_2 / f_1)]$. These are the two K-L
divergence measures. Also
$D_{-1/2}(f_1, f_2) = 4(1 - \int \sqrt{f_1 f_2}) = 2 H^2(f_1, f_2)$, where
$H(f_1, f_2) = \{2(1 - \int \sqrt{f_1 f_2})\}^{1/2}$ is the B-H divergence measure.
In the present context, we consider the divergence between the posterior
predictive distribution of $y_u$ given $y_s$ and the posterior predictive
distribution of $y_u$ given $y_s$ with one of the $y_i$, say, $y_k$,
$k = 1, \ldots, n$, removed. To this end, we first state a general divergence
result involving two multivariate normal distributions based on the general
divergence measure as given in (2). The result is proved in Ghosh, Mergel and
Datta (2006).

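Theorem 1. Let $f_1$ and $f_2$ denote the $N_p(\mu_1, \Sigma_1)$ and
$N_p(\mu_2, \Sigma_2)$ pdf's respectively. Then

$D_\lambda(f_1, f_2) = \frac{1}{\lambda(\lambda + 1)} \Big[ \exp\Big\{ \frac{\lambda(\lambda + 1)}{2} (\mu_1 - \mu_2)^T ((1 + \lambda)\Sigma_2 - \lambda\Sigma_1)^{-1} (\mu_1 - \mu_2) \Big\}$
$\qquad \times\, |\Sigma_1|^{-\lambda/2}\, |\Sigma_2|^{(\lambda + 1)/2}\, |(1 + \lambda)\Sigma_2 - \lambda\Sigma_1|^{-1/2} - 1 \Big].$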
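
As a rough numerical illustration of Theorem 1, the sketch below evaluates the closed form $\frac{1}{\lambda(\lambda+1)}[\exp\{\frac{\lambda(\lambda+1)}{2}(\mu_1-\mu_2)^T((1+\lambda)\Sigma_2-\lambda\Sigma_1)^{-1}(\mu_1-\mu_2)\}|\Sigma_1|^{-\lambda/2}|\Sigma_2|^{(\lambda+1)/2}|(1+\lambda)\Sigma_2-\lambda\Sigma_1|^{-1/2}-1]$ and compares it, for $\lambda$ close to 0 and to -1, with the standard K-L formula for two normal distributions. The function names and the example parameters are assumptions made only for this illustration, and the code presumes that $(1+\lambda)\Sigma_2-\lambda\Sigma_1$ is positive definite.

```python
import numpy as np

def normal_divergence(mu1, S1, mu2, S2, lam):
    """D_lambda(f1, f2) for f1 = N_p(mu1, S1) and f2 = N_p(mu2, S2).

    Assumes lam is neither 0 nor -1 and that (1 + lam) * S2 - lam * S1 is
    positive definite, which always holds for -1 < lam < 0.
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    S1, S2 = np.asarray(S1, float), np.asarray(S2, float)
    M = (1.0 + lam) * S2 - lam * S1
    d = mu1 - mu2
    quad = d @ np.linalg.solve(M, d)
    # Log of E_{f1}[(f1 / f2)^lam]; working on the log scale keeps the
    # determinant factors numerically stable.
    log_e = (0.5 * lam * (lam + 1.0) * quad
             - 0.5 * lam * np.linalg.slogdet(S1)[1]
             + 0.5 * (lam + 1.0) * np.linalg.slogdet(S2)[1]
             - 0.5 * np.linalg.slogdet(M)[1])
    return np.expm1(log_e) / (lam * (lam + 1.0))

def kl_normal(mu1, S1, mu2, S2):
    """Standard closed form of E_{f1}[log(f1 / f2)] for two normals."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    S1, S2 = np.asarray(S1, float), np.asarray(S2, float)
    d = mu2 - mu1
    return 0.5 * (np.trace(np.linalg.solve(S2, S1)) + d @ np.linalg.solve(S2, d)
                  - len(mu1) + np.linalg.slogdet(S2)[1] - np.linalg.slogdet(S1)[1])

# Hypothetical parameters, chosen only for illustration.
mu1, S1 = [0.0, 1.0], [[1.0, 0.2], [0.2, 2.0]]
mu2, S2 = [0.5, 0.0], [[1.5, 0.0], [0.0, 1.0]]
near_zero = normal_divergence(mu1, S1, mu2, S2, 1e-6)
near_minus_one = normal_divergence(mu1, S1, mu2, S2, -1.0 + 1e-6)
```

Here `near_zero` should be close to `kl_normal(mu1, S1, mu2, S2)` and `near_minus_one` close to `kl_normal(mu2, S2, mu1, S1)`, in line with the two limiting cases noted after (2).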

It follows from the above general result that the divergence between two normal
distributions depends on the two mean vectors only through a quadratic form in
their difference. In the present context, this quadratic form in the difference
of the mean vectors of the two posterior predictive distributions of $y_u$ turns
out to be a multiple of the square of

$\bar{y}_w - \sum_{i \neq k} a_i \sigma_i^{-2} y_i \Big/ \sum_{i \neq k} a_i^2 \sigma_i^{-2},$

which on simplification reduces to a known multiple of the square of
$y_k / a_k - \bar{y}_w$. Thus, one needs to control the residuals
$y_k / a_k - \bar{y}_w$ for finding robust estimators of the finite population
mean $\bar{y}_P$. However, in order to make these residuals scale-free, we
consider the standardized residuals $(y_k / a_k - \bar{y}_w) / v_k$, where
$v_k^2 = V(y_k / a_k - \bar{y}_w) = (a_k^2 \sigma_k^{-2})^{-1} - (\sum_{i=1}^n a_i^2 \sigma_i^{-2})^{-1}$.
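
The reduction to the residual $y_k/a_k - \bar{y}_w$ can be checked numerically. The sketch below, under the stated model and with made-up sample values, compares the change in the weighted mean when unit $k$ is dropped with the multiple $a_k^2\sigma_k^{-2}(y_k/a_k - \bar{y}_w)/(\sum_{i} a_i^2\sigma_i^{-2} - a_k^2\sigma_k^{-2})$, and also computes the standardized residuals. The helper names are hypothetical.

```python
import numpy as np

def weighted_mean(y, a, sigma2):
    """ybar_w = sum(a_i y_i / sigma_i^2) / sum(a_i^2 / sigma_i^2)."""
    return np.sum(a * y / sigma2) / np.sum(a**2 / sigma2)

def standardized_residuals(y, a, sigma2):
    """Return (r, v) with r_k = (y_k / a_k - ybar_w) / v_k; needs n >= 2."""
    prec = a**2 / sigma2
    v = np.sqrt(1.0 / prec - 1.0 / prec.sum())
    return (y / a - weighted_mean(y, a, sigma2)) / v, v

# Hypothetical sample values, chosen only for illustration.
y = np.array([2.1, 3.9, 6.2, 30.0])
a = np.array([1.0, 2.0, 3.0, 4.0])
sigma2 = np.array([1.0, 2.0, 3.0, 4.0])

prec = a**2 / sigma2
yw = weighted_mean(y, a, sigma2)
shifts, predicted = [], []
for k in range(len(y)):
    keep = np.arange(len(y)) != k
    # Change in the weighted mean when unit k is left out.
    shifts.append(yw - weighted_mean(y[keep], a[keep], sigma2[keep]))
    # The same change written as a known multiple of the residual.
    predicted.append(prec[k] * (y[k] / a[k] - yw) / (prec.sum() - prec[k]))
r, v = standardized_residuals(y, a, sigma2)
```

The two lists `shifts` and `predicted` should agree up to rounding, which is the simplification used in the text.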

It is instructive to find a connection between the proposed method of finding the 
influence functions, and those that are widely used in the robust statistics literature. 
Following the approach of Hampel [12], the influence function (IF) of the functional 
T at distribution function F is given by 

$IF(x; T, F) = \lim_{t \to 0} \frac{T((1 - t)F + t\delta_x) - T(F)}{t},$

for $x \in \mathcal{X}$ where this limit exists. Here $T(F)$ is the parameter of
interest, and $\delta_x$ is the Dirac delta function. Thus when the parameter of
interest is the population mean, namely $\theta = T(F) = \int x \, dF$,
$X \sim F$, writing $F_t = (1 - t)F + t\delta_{x_0}$,

$T(F_t) = \int x \, d[(1 - t)F + t\delta_{x_0}] = (1 - t) \int x \, dF + t x_0.$

Hence, $IF(x_0; T, F) = x_0 - \theta$. In our case, conditional on $\theta$,
$E(y_k / a_k) = \theta$, and the natural estimator of $\theta$ is $\bar{y}_w$.
Hence, we estimate $y_k / a_k - \theta$ by $y_k / a_k - \bar{y}_w$. However, it is
more appropriate to consider the scale-free residuals
$r_k = (y_k / a_k - \bar{y}_w) / v_k$.
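
The influence function of the mean can be illustrated with a finite-difference approximation. In the sketch below, $F$ is taken to be the empirical distribution of a small made-up sample (an assumption for illustration only), and $[T(F_t) - T(F)]/t$ is evaluated for a small $t$.

```python
def mean_of_mixture(sample, x0, t):
    """Mean of F_t = (1 - t) F + t * delta_{x0}, F the empirical distribution."""
    n = len(sample)
    # Each sample point carries mass (1 - t) / n; the point x0 carries mass t.
    return sum((1.0 - t) / n * x for x in sample) + t * x0

def finite_difference_if(sample, x0, t=1e-6):
    """Approximate IF(x0; T, F) by [T(F_t) - T(F)] / t for the mean functional."""
    return (mean_of_mixture(sample, x0, t) - mean_of_mixture(sample, x0, 0.0)) / t

sample = [1.0, 2.0, 4.0, 5.0]  # hypothetical data
x0 = 10.0                      # hypothetical contaminating point
approx_if = finite_difference_if(sample, x0)
exact_if = x0 - sum(sample) / len(sample)
```

Because the mean functional is linear in $F$, the finite difference agrees with $x_0 - \theta$ up to rounding error for any small $t$.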

Based on these scale-free residuals, and writing
$w_i = a_i^2 \sigma_i^{-2} / \sum_{j=1}^n a_j^2 \sigma_j^{-2}$, $i = 1, \ldots, n$,
we propose the robust estimator

(3)  $\hat{\theta}_R = \bar{y}_w + \sum_{i=1}^n w_i v_i \big[ r_i I_{[|r_i| \le C]} + C I_{[r_i > C]} + (-C) I_{[r_i < -C]} \big].$

Consequently, the proposed robust estimator of the finite population mean
$\bar{y}_P$ is given by

(4)  $\bar{y}_P^{(R)} = N^{-1} \Big[ \sum_{i=1}^n y_i + \hat{\theta}_R \sum_{j=n+1}^N a_j \Big].$



Remark 1. The proposed estimator $\hat{\theta}_R$ of $\theta$ or $\bar{y}_P^{(R)}$ of $\bar{y}_P$ is similar in spirit to the "limited
translation estimator" of Efron and Morris [6, 7]. However, the present motivation 
of these estimators from the predictive influence function point of view is entirely 
new. We will address the question of choice of the constant C in the next section. 
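
The estimators (3) and (4) can be sketched in a few lines. The code below is only an illustration under the stated model: it truncates the standardized residuals $r_i = (y_i/a_i - \bar{y}_w)/v_i$ at $\pm C$, forms $\hat{\theta}_R = \bar{y}_w + \sum_i w_i v_i \cdot \mathrm{clip}(r_i, -C, C)$, and plugs it into $N^{-1}[\sum_{i \le n} y_i + \hat{\theta}_R \sum_{j > n} a_j]$. The function name and the data are hypothetical.

```python
import numpy as np

def robust_estimates(y_s, a_s, sigma2_s, a_u, C):
    """Return (theta_R, ybar_P_R) following (3) and (4); needs n >= 2."""
    y_s, a_s, sigma2_s, a_u = (np.asarray(z, float)
                               for z in (y_s, a_s, sigma2_s, a_u))
    prec = a_s**2 / sigma2_s                 # a_i^2 / sigma_i^2
    w = prec / prec.sum()                    # weights w_i
    yw = np.sum(w * y_s / a_s)               # weighted mean ybar_w
    v = np.sqrt(1.0 / prec - 1.0 / prec.sum())
    r = (y_s / a_s - yw) / v                 # standardized residuals
    # np.clip(r, -C, C) equals r on |r| <= C, C above C, and -C below -C.
    theta_R = yw + np.sum(w * v * np.clip(r, -C, C))
    N = len(y_s) + len(a_u)
    return theta_R, (y_s.sum() + theta_R * a_u.sum()) / N

# Hypothetical sample with one large value; a_i = sigma_i^2 = 1 throughout.
y_s = [1.0, 1.2, 0.9, 1.1, 6.0]
a_s = [1.0] * 5
sigma2_s = [1.0] * 5
a_u = [1.0] * 15

theta_plain, mean_plain = robust_estimates(y_s, a_s, sigma2_s, a_u, C=np.inf)
theta_rob, mean_rob = robust_estimates(y_s, a_s, sigma2_s, a_u, C=1.5)
```

With `C=np.inf` no residual is truncated and, since $\sum_i w_i v_i r_i = 0$, the estimator reduces to $\bar{y}_w$; with a finite `C` the large observation is pulled back, so `theta_rob` lies below `theta_plain` in this example.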

Remark 2. In the special case when $a_i = \pi_i$ and
$\sigma_i^2 = \pi_i^2 / (1 - \pi_i)$, $\pi_i > 0$ and $\sum_{i=1}^N \pi_i = n$,
$w_i = (1 - \pi_i) / \sum_{j=1}^n (1 - \pi_j)$ and
$v_i^2 = (1 - \pi_i)^{-1} - (\sum_{j=1}^n (1 - \pi_j))^{-1}$.
The resulting estimator $\bar{y}_P^{(R)}$ of $\bar{y}_P$ is a robust alternative
to the celebrated Horvitz-Thompson estimator.
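
As a quick check of this special case, the sketch below evaluates the non-robust model-based estimator $N^{-1}[\sum_{i \le n} y_i + \bar{y}_w \sum_{j > n} a_j]$ with $a_i = \pi_i$ and $\sigma_i^2 = \pi_i^2/(1 - \pi_i)$ and compares it with the Horvitz-Thompson estimator. The inclusion probabilities and sample values are made up for illustration, and the helper name is hypothetical.

```python
import numpy as np

def model_based_mean(y_s, a_s, sigma2_s, a_u):
    """N^{-1} [sum of sampled y + ybar_w * sum of unsampled a]."""
    yw = np.sum(a_s * y_s / sigma2_s) / np.sum(a_s**2 / sigma2_s)
    return (y_s.sum() + yw * a_u.sum()) / (len(y_s) + len(a_u))

# Hypothetical inclusion probabilities for N = 6 units; they sum to n = 3.
pi = np.array([0.6, 0.5, 0.4, 0.7, 0.3, 0.5])
n = 3
y_s = np.array([12.0, 7.0, 9.0])   # hypothetical sampled values
pi_s, pi_u = pi[:n], pi[n:]

model_estimate = model_based_mean(y_s, pi_s, pi_s**2 / (1.0 - pi_s), pi_u)
horvitz_thompson = np.sum(y_s / pi_s) / len(pi)

# Weights and squared scale factors of Remark 2.
w = (1.0 - pi_s) / np.sum(1.0 - pi_s)
v2 = 1.0 / (1.0 - pi_s) - 1.0 / np.sum(1.0 - pi_s)
```

The agreement of `model_estimate` and `horvitz_thompson` relies on $\sum_{i=1}^N \pi_i = n$, which makes $\sum_{j > n} \pi_j$ equal to $\sum_{i \le n} (1 - \pi_i)$.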

Remark 3. Next in the case when $a_i = x_i$ and $\sigma_i^2 = \sigma^2 h(x_i)$, it
follows that $w_i = [x_i^2 / h(x_i)] / \sum_{j=1}^n [x_j^2 / h(x_j)]$. In this
case, our estimator is similar to the one of Chambers [4] except that Chambers
used $\sigma_i / a_i$ rather than $v_i$ as the scaling factor. To see the
difference between the two estimators in the special case of robust alternatives
to the ratio estimator, that is, where $a_i = x_i$ and
$\sigma_i^2 = \sigma^2 x_i$, the proposed scaling factor $v_i$ simplifies to
$\sigma [x_i^{-1} - (\sum_{j=1}^n x_j)^{-1}]^{1/2}$. In contrast, Chambers's
scaling factor is just $\sigma x_i^{-1/2}$. One interesting feature is that we
bring in the notion of influence functions in the derivation of robust estimators
in finite population sampling, which could be potentially useful as a general
approach to robust estimation in the model-based analysis of more complex surveys
as well.
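
The two scaling factors in the ratio-estimator case can be compared directly. The sketch below uses hypothetical auxiliary values $x_i$ and a hypothetical $\sigma$; it computes $\sigma[x_i^{-1} - (\sum_j x_j)^{-1}]^{1/2}$ and $\sigma x_i^{-1/2}$ and checks the former against the general expression $v_i^2 = (a_i^2\sigma_i^{-2})^{-1} - (\sum_j a_j^2\sigma_j^{-2})^{-1}$.

```python
import numpy as np

def proposed_scale(x, sigma):
    """v_i = sigma * sqrt(1 / x_i - 1 / sum(x)) for a_i = x_i, sigma_i^2 = sigma^2 x_i."""
    return sigma * np.sqrt(1.0 / x - 1.0 / x.sum())

def chambers_scale(x, sigma):
    """sigma * x_i^{-1/2}, the scaling factor attributed to Chambers in the text."""
    return sigma / np.sqrt(x)

x = np.array([2.0, 5.0, 10.0, 40.0])   # hypothetical auxiliary values
sigma = 1.5                            # hypothetical scale parameter

# General expression for v_i with a_i = x_i and sigma_i^2 = sigma^2 * x_i.
prec = x**2 / (sigma**2 * x)
general_v = np.sqrt(1.0 / prec - 1.0 / prec.sum())

v_proposed = proposed_scale(x, sigma)
v_chambers = chambers_scale(x, sigma)
ratio = v_proposed / v_chambers        # equals sqrt(1 - x_i / sum(x))
```

The ratio of the two factors is $\sqrt{1 - x_i/\sum_j x_j}$, so the proposed factor is always the smaller one and the gap is largest for units with a large share of the sample total of $x$.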

In the next section, we will find the frequentist mean squared error (that is,
conditional on $\theta$) of $\bar{y}_P^{(R)}$.

3. Mean squared error 

We first find an expression for the mean squared error (MSE) of
$\bar{y}_P^{(R)}$ as an estimator of $\bar{y}_P$. In the process, we also make
some observations about the choice of the constant C. To this end, we first prove
the following theorem.

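Theorem 2. Under the assumed model, conditional on $\theta$,

(5)  $E(\bar{y}_P^{(R)} - \bar{y}_P)^2 = N^{-2} \Big[ \sum_{j=n+1}^N \sigma_j^2 + \Big\{ \Big( \sum_{i=1}^n a_i^2 \sigma_i^{-2} \Big)^{-1} + \Big( \sum_{i=1}^n w_i^2 v_i^2 \Big) g(C) \Big\} \Big( \sum_{j=n+1}^N a_j \Big)^2 \Big],$

where $\Phi$ and $\phi$ denote respectively the N(0,1) df and pdf and

(6)  $g(C) = 2[(C^2 + 1)\Phi(-C) - C\phi(C)].$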

Remark 4. It may be noted that under the assumed model, the MSE of
$\hat{\bar{y}}_P$, the posterior mean of $\bar{y}_P$, also the best unbiased
predictor (best linear unbiased predictor without normality), is given by
$N^{-2} [\sum_{j=n+1}^N \sigma_j^2 + (\sum_{i=1}^n a_i^2 \sigma_i^{-2})^{-1} (\sum_{j=n+1}^N a_j)^2]$.
Thus, if the assumed model is true, the excess risk of the proposed robust
estimator is given by
$N^{-2} (\sum_{i=1}^n w_i^2 v_i^2)\, g(C)\, (\sum_{j=n+1}^N a_j)^2$. Noting that
$g'(C) = 4[C\Phi(-C) - \phi(C)] < 0$ (Feller [8], page 166), it follows that
$g(C)$ is decreasing in C. This is intuitively expected since the larger the
value of C, the closer $\hat{\theta}_R$ is to $\theta$. The constant C will be
chosen by setting an upper bound, say, M to this excess risk, and then solving
for C numerically by equating this excess risk to M. The choice of M will be
clearly left to the experimenter. The main idea is to seek a tradeoff between
robustness against model failure and the maximum excess risk that one is willing
to tolerate by proposing this robust estimator when the assumed model is true.
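
The numerical choice of C can be sketched with a simple bisection, using the fact that $g$ is decreasing. The code below takes $g(C) = 2[(C^2+1)\Phi(-C) - C\phi(C)]$, the second moment of the truncated residual term, and solves $N^{-2}(\sum_i w_i^2 v_i^2)\, g(C)\, (\sum_{j>n} a_j)^2 = M$. The function `choose_C`, its search interval and the input values are assumptions made for illustration, not a prescription from the paper.

```python
import math

def g(C):
    """g(C) = 2[(C^2 + 1) Phi(-C) - C phi(C)] for the standard normal."""
    Phi_neg = 0.5 * math.erfc(C / math.sqrt(2.0))             # Phi(-C)
    phi = math.exp(-0.5 * C * C) / math.sqrt(2.0 * math.pi)   # phi(C)
    return 2.0 * ((C * C + 1.0) * Phi_neg - C * phi)

def choose_C(M, sum_w2v2, sum_a_u, N, lo=0.0, hi=10.0, tol=1e-10):
    """Solve excess_risk(C) = M by bisection on [lo, hi] (g is decreasing)."""
    scale = sum_w2v2 * sum_a_u**2 / N**2
    target = M / scale
    if target >= g(lo):
        return lo          # the bound M is already met at the smallest C
    if target <= g(hi):
        raise ValueError("M is too small for the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > target:
            lo = mid       # excess risk still above M: increase C
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical inputs: sum of w_i^2 v_i^2, sum of unsampled a_j, and N.
sum_w2v2, sum_a_u, N, M = 0.16, 95.0, 100, 0.01
C_star = choose_C(M, sum_w2v2, sum_a_u, N)
excess_at_C = sum_w2v2 * g(C_star) * sum_a_u**2 / N**2
```

A smaller tolerated excess risk M gives a larger C, that is, less truncation and an estimator closer to the non-robust one.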

Proof of Theorem 2. Throughout, the calculations are done conditional on
$\theta$. First, from the independence of the $y_i$, $i = 1, \ldots, N$, for fixed
$\theta$, it follows that

(7)  $E(\bar{y}_P^{(R)} - \bar{y}_P)^2 = N^{-2} \Big[ \sum_{j=n+1}^N \sigma_j^2 + \Big( \sum_{j=n+1}^N a_j \Big)^2 E(\hat{\theta}_R - \theta)^2 \Big].$

Next, noting that $\sum_{i=1}^n w_i v_i r_i = \bar{y}_w - \bar{y}_w = 0$, from (3),
one can alternately write $\hat{\theta}_R$ as

(8)  $\hat{\theta}_R = \bar{y}_w - \sum_{i=1}^n w_i v_i \big[ (r_i - C) I_{[r_i > C]} + (r_i + C) I_{[r_i < -C]} \big].$

Next, due to the independence of the $y_i$,
$\mathrm{Cov}(y_i / a_i - \bar{y}_w, \bar{y}_w) = (\sum_{j=1}^n a_j^2 \sigma_j^{-2})^{-1} - (\sum_{j=1}^n a_j^2 \sigma_j^{-2})^{-1} = 0$.
Hence, because of normality, $y_i / a_i - \bar{y}_w$, and accordingly $r_i$, is
distributed independently of $\bar{y}_w$. Accordingly, it follows from (8) that

(9)  $E(\hat{\theta}_R - \theta)^2 = V(\bar{y}_w) + E\Big\{ \sum_{i=1}^n w_i v_i \big[ (r_i - C) I_{[r_i > C]} + (r_i + C) I_{[r_i < -C]} \big] \Big\}^2.$

Since $r_i \sim N(0, 1)$, it follows after some algebra that

(10)  $E\big[ (r_i - C) I_{[r_i > C]} + (r_i + C) I_{[r_i < -C]} \big]^2 = 2[(C^2 + 1)\Phi(-C) - C\phi(C)].$
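
The algebra behind (10) can be checked by numerical integration. The sketch below compares the closed form $2[(C^2+1)\Phi(-C) - C\phi(C)]$ with a midpoint-rule approximation of $2\int_C^\infty (r - C)^2 \phi(r)\, dr$, where the factor 2 comes from the symmetry of the standard normal and the two indicator events being disjoint. The function names, the upper integration limit and the number of steps are arbitrary choices made for illustration.

```python
import math

def closed_form(C):
    """2[(C^2 + 1) Phi(-C) - C phi(C)] for the standard normal."""
    Phi_neg = 0.5 * math.erfc(C / math.sqrt(2.0))
    phi = math.exp(-0.5 * C * C) / math.sqrt(2.0 * math.pi)
    return 2.0 * ((C * C + 1.0) * Phi_neg - C * phi)

def by_quadrature(C, upper=12.0, steps=100_000):
    """2 * integral from C to upper of (r - C)^2 phi(r) dr, midpoint rule."""
    h = (upper - C) / steps
    total = 0.0
    for j in range(steps):
        r = C + (j + 0.5) * h
        total += (r - C) ** 2 * math.exp(-0.5 * r * r)
    return 2.0 * total * h / math.sqrt(2.0 * math.pi)

# Largest discrepancy over a few hypothetical truncation points.
max_gap = max(abs(closed_form(C) - by_quadrature(C)) for C in (0.0, 0.5, 1.0, 2.0))
```

At $C = 0$ both expressions reduce to $E(r_i^2) = 1$, which is a convenient sanity check on the constants.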

Next, by the fact that for $i \neq k$,
$(r_i, r_k) \stackrel{d}{=} (r_i, -r_k) \stackrel{d}{=} (-r_i, r_k) \stackrel{d}{=} (-r_i, -r_k)$
(where $\stackrel{d}{=}$ signifies "has the same distribution as"), one gets

(11)  $E\big[ \{(r_i - C) I_{[r_i > C]} + (r_i + C) I_{[r_i < -C]}\} \{(r_k - C) I_{[r_k > C]} + (r_k + C) I_{[r_k < -C]}\} \big]$
$\qquad = E\big[ \{(r_i - C)(r_k - C) - (r_i - C)(r_k - C) - (r_i - C)(r_k - C) + (r_i - C)(r_k - C)\} I_{[r_i > C]} I_{[r_k > C]} \big] = 0.$

The theorem follows now from (7) and (9)-(11). □

4. Summary and conclusion

The paper proposes some robust estimators which can guard against outlying obser- 
vations in connection with model-based inference in finite population sampling. In 
the process, new robust alternatives to the ratio estimators as well as the Horvitz- 
Thompson estimator are found. The mean squared errors of these model-based 
estimators are also obtained. Future work will encompass extension of these ideas 
to more complex surveys, for example in multistage stratified sampling, and also to 
address situations when there is wide departure from the assumed model. 

Acknowledgments. Thanks are due to a referee for constructive comments. 